ABSTRACT

We describe a Bayesian methodology to evaluate the consistency between 
the reported Ginga and BATSE detections of absorption features in gamma 
ray burst spectra. Currently no features have been detected by BATSE, 
but this methodology will still be applicable if and when such features are 
discovered. The Bayesian methodology permits the comparison of hypotheses 
regarding the two detectors' observations, and makes explicit the subjective 
aspects of our analysis (e.g., the quantification of our confidence in detector 
performance). We also present non-Bayesian consistency statistics. Based on 
preliminary calculations of line detectability we find that both the Bayesian 
and non-Bayesian techniques show that the BATSE and Ginga observations are 
consistent given our understanding of these detectors. 

Subject headings: gamma rays: bursts — methods: statistical 



1. INTRODUCTION 

The presence or absence of absorption lines in the gamma ray burst (GRB) spectra 
observed by the detectors of the Burst and Transient Source Experiment (BATSE) on 
board the Compton Gamma Ray Observatory (GRO) is one of the most pressing issues in
the BATSE study of GRBs. The absorption lines observed in the 15-75 keV band by earlier
GRB instruments (Konus — Mazets et al. 1981; HEAO-1 — Hueter 1987; Ginga — Murakami 
et al. 1988) were interpreted as cyclotron absorption in a teragauss magnetic field (e.g., 
Wang et al. 1989), and as such, reinforced the identification of the GRB sources with 
Galactic neutron stars. Neutron stars are the only known astrophysical site of such strong 
fields. However, the observed locations and intensities of the BATSE bursts (Meegan et 
al. 1992) have undermined the Galactic neutron star paradigm for burst origin. The 
angular distribution is isotropic, yet the intensity distribution is inconsistent with a 
uniform, three-dimensional Euclidean source density. Therefore, we are at the center of a 
spherical source distribution which decreases radially, and not within a disk population. 
Consequently the search for absorption features in the BATSE spectra has taken on 
additional importance. 

No definitive lines have been discovered by the BATSE team thus far. We have 
reported this in shorter, less complete presentations (Teegarden et al. 1993; Band et al. 
1993a; Palmer et al. 1994a); in the current series of papers, beginning with Palmer (1994b), 
we describe our search and analysis methods in greater detail. Since this is an ongoing
search, our analysis methods and results will undoubtedly change over the course of this
series of publications.

In the current report we present the methodology by which we compare the Ginga 
and BATSE detections and nondetections. While the presentation of this methodology 
is our primary objective here, we demonstrate the use of the resulting formulae with 
an approximation to the observed data, and therefore we draw relevant conclusions
about BATSE-Ginga consistency which will most likely remain true for more accurate
calculations. We derive the consistency statistics using a Bayesian formulation; however, the
meaning of the resulting expressions does not require a detailed understanding of Bayesian 
methodology. Built on a Bayesian foundation, this analysis employs concepts which are 
formally not permitted in classical "frequentist" statistics (e.g., distributions for unknown 
parameter values, not just "random" variables), but nonetheless the derived formulae should 
be considered reasonable and therefore acceptable to the astrophysical community. In our 
derivation we describe the Bayesian concepts where they are applied, and note deviations 
from orthodox frequentist or Bayesian usage. We also present frequentist consistency
statistics, which we find lead to conclusions similar to those from the Bayesian formulae.

One of the central tenets of Bayesian inference is that probabilities are measures
of our confidence in the truth of propositions, rather than simply the frequency with
which a result occurs (e.g., Loredo 1990). Therefore the probability that a hypothesis is
true can be evaluated based on both prior quantitative information (and more qualitative
expectations) and new observations. This permits hypotheses to be compared by comparing
their probabilities. In our case we ask: given the observations, what are the odds (the ratio
of the hypothesis probabilities) that the BATSE and Ginga results are consistent compared
to inconsistent? Below we will explicitly compare the consistency hypothesis (represented
as proposition $H_0$) to various alternative hypotheses (propositions $H_x$); for example, $H_1$
states that because of a detector defect, BATSE is unable to detect the absorption lines
which are present. The observations may be very unlikely for both hypotheses (e.g., for
results drawn from a continuum of possibilities), yet can still favor one over the other.

The Bayesian formulation has a number of virtues. First, this approach permits us to 
frame the consistency calculation in terms of the observed distribution of detections and 
nondetections, and does not require assumptions about the population of results from which 
the observations may have been drawn. Second, while Bayesian inference has frequently 
been criticized for the apparent arbitrariness in quantifying prior expectations, this merely 
makes explicit an arbitrariness that is also present in "frequentist" methods. For example, 
the threshold which a frequentist statistic must exceed before we accept a conclusion is 
based on our expectation as to the likelihood of that conclusion: a conclusion contrary 
to our expectations requires a more extreme threshold. Third, the Bayesian formulation 
provides guidance as to the elimination of unknown parameters whose values are necessary 
for deciding the consistency question but are not intrinsically interesting for this issue (i.e., 
"marginalization" of "nuisance" parameters). In our case the frequency with which lines 
occur is the nuisance parameter. Finally, this methodology will continue to be applicable 
if and when we do detect cyclotron lines in BATSE bursts since it does not depend on any 
particular pattern of detections and nondetections. 

While absorption features have been reported by a number of instruments, for the 
purpose of quantitative comparison we need well documented details of both the detections 
and the bursts which were searched. Similarly, we consider only the statistically significant 
lines reported by other detectors. We evaluate the line significance with the F-test, which
compares fits to the spectrum of a continuum with and without lines. The F-test gives the
probability that the decrease in $\chi^2$ of the continuum plus line model versus the continuum
model alone is due to chance when no line actually exists (Martin 1971, pp. 144-147), in
other words, the probability that a fluctuation of the continuum would appear as significant
as the observed feature. Note that a smaller probability indicates a more significant line.
For the BATSE detectors we have established a detection threshold of an F-test probability
$P_F < 10^{-4}$; the threshold has been chosen to eliminate spurious detections. Also,
the spectra obtained by all detectors capable of observing the feature must be consistent. 
As we will show, currently the BATSE observations can be compared to only two Ginga 
detections.

Absorption features were reported to be present in ~20% of the Konus bursts (Mazets
et al. 1981). However, the Konus spectra were analyzed under the assumption that the
continuum was $\propto E^{-1} \exp(-E/E_0)$ (Mazets et al. 1982, 1983), while we find that low
energy GRB spectra are best fit by a variety of predominantly flatter spectral models (Band
et al. 1993b); we do not know whether the reported lines would be significant with a more
realistic continuum. Finally, the significance of the line features and their detectability
(the probabilities for actual and spurious detections) in all the Konus bursts have not been
provided, precluding quantitative comparison.

Two line detections have been reported among the 21 HEAO-1 bursts (Hueter 1987),
one with a significance of $5.6 \times 10^{-4}$ and the other with a significance of $3 \times 10^{-3}$. Neither
of these line candidates would qualify as detections by our detection threshold. In addition,
we have no information about line detectability in the ensemble of HEAO-1 bursts. 

The Ginga bursts provide the best documented detections. Four sets of lines in 
the Ginga bursts have been reported, but only two sets meet our detection criterion of
$P_F < 10^{-4}$ (the following line significances have been recalculated using the fit parameters
in the indicated references). Thus the lines in the S2 segment of GB870303 (with a
significance of $1.1 \times 10^{-3}$; Murakami et al. 1988) and in GB890929 (a significance of
$2.7 \times 10^{-3}$; Yoshida et al. 1991) cannot be considered detections. The harmonically
spaced lines at 19.3 and 38.6 keV in GB880205 ($2.4 \times 10^{-5}$; Murakami et al. 1988), and
the single line at 21.1 keV in the S1 segment of GB870303 ($1.5 \times 10^{-7}$; Graziani et al.
1992) constitute the Ginga detections. Although the line in GB870303 is formally very
significant, the low signal-to-background of the continuum and the small final $\chi^2$ (14.49 for
30 degrees of freedom, $P(\chi^2 < 14.49) = 7.5 \times 10^{-3}$) make this feature suspect.

Therefore we compare the BATSE nondetections to the Ginga detections. In the 
absence of a large enough ensemble of detections to characterize the distribution of line 
parameters, we use the two Ginga line detections to define two line types. In our example 
we calculate quantities for each Ginga line type separately, and also for the two types 
together. 

Previously (Band et al. 1993c) we defined the consistency statistic as the (non-Bayesian)
probability of two or more detections in any of the Ginga bursts and none in
the BATSE data, that is, of a result at least as discrepant as the observations. This
probability is a function of the unknown line frequencies. Maximizing this statistic with 
respect to the line frequencies placed an upper limit on this probability of ~ 5%. However, 
this probability tells us how unlikely the pattern of detections and nondetections is, but 
does not directly inform us whether there is an inconsistency. As we discuss below, the 
observations may be even more unlikely under various hypotheses about inconsistencies
between the two instruments. This was the motivation for the adoption of a Bayesian
analysis. Nonetheless, we also present non-Bayesian consistency statistics which we find 
lead to similar conclusions, at least for our example based on approximations to the data. 

Because of the small number of detections, we derive a likelihood function for discrete 
line types and only comment briefly on continuous distributions of line parameters (§2.1). 
This likelihood function is required by the Bayesian methodology (§2.2); the Bayesian 
formalism also provides estimates of the line frequencies which are both interesting in their 
own right and useful for the consistency analysis. Using this methodology we compare 
the consistency hypothesis to various hypotheses about possible sources of the apparent 
discrepancy between detectors (§3); these formulae are then applied to an illustrative 
example which approximates the observations (§4). For completeness we present our earlier 
frequentist consistency calculations (§5). The implications of these various consistency 
measures are discussed in §6, after which we summarize our conclusions (§7). We use the
standard notation where $p(a \mid b)$ means the probability of proposition $a$ given proposition $b$
(a proposition may be a hypothesis, a model's validity or its parameter list).



2. BAYESIAN CONSISTENCY PROBABILITY 

2.1. Likelihood Function 

The population of absorption features is characterized by distributions of line 
parameters such as energy centroids, equivalent widths, intrinsic widths, harmonics, etc. 
Undoubtedly these parameters vary continuously. However, the number of detections is
insufficient to determine these distributions. Instead of attempting to model the parameter 
distributions, we restrict the line population to the two line types defined by the Ginga 
detections. Therefore we develop our methodology for a finite number of line types, and 
only comment briefly on the continuum limit. First we develop the likelihood function, the 
probability of obtaining the data under a set of hypotheses (which include our understanding 
of the instruments), and then we embed this likelihood in a Bayesian framework. 

Assume there are $n_t$ line types, each defined by a set of parameters $e_p$, where $p$ denotes
the line type (1 or 2 for the line population based on the Ginga detections). Let $f_p = f(e_p)$
be the frequency with which line type $p$ (defined by $e_p$) occurs in bursts, regardless of
whether the line is detectable or whether other line types are present. In the absence of
information about different burst populations, we assume the entire burst population is
characterized by the same line type distribution $f = f(e)$ (i.e., $f$ is the set of all $f_p$);
thus we do not assume that lines are evident only in long duration multispike bursts, for
example. Note that we allow the existence of more than one line type in a burst. This is
justified by the presence of the S1 line at ~20 keV and the S2 lines at ~20 and ~40 keV
in GB870303 (although the second set of lines is not significant enough to be considered a
detection). Further, we postulate that each line type is independent of the presence of all
other line types.

We represent the detection or nondetection of the line with parameters $e_p$ in the $i$th
burst by the propositions $L_i(e_p)$ and $\bar{L}_i(e_p)$, respectively, and the existence or absence of
the line by $l_i(e_p)$ and $\bar{l}_i(e_p)$, respectively. Our assessment of the detection probabilities
depends on our understanding of detector responses, etc. We make this dependence explicit
by including the proposition $I$, representing our knowledge of the observations, detector
performance, etc., as one of the conditions in our expressions. Similarly, the probabilities
will depend on the hypothesis $H$ which we are evaluating. Finally, the probabilities are
functions of the line frequency distribution $f$, which must be modeled if unknown (as is
currently the case).

Therefore we can express the probability of detecting a line as 

$$ p(L_i(e_p) \mid f H I) = p(L_i(e_p) \mid l_i(e_p)\, f H I)\; p(l_i(e_p) \mid f H I)
   + p(L_i(e_p) \mid \bar{l}_i(e_p)\, f H I)\; p(\bar{l}_i(e_p) \mid f H I) . \tag{1} $$

The first term on the right is the probability for detecting real lines, while the second term
is the probability for a false positive. We assume that the line types are distinct enough
that one cannot be confused with another; otherwise $p(L(e_p) \mid l(e_p) f H I)\, p(l(e_p) \mid f H I)$ must
be replaced by $\sum_{\sigma=1}^{n_t} p(L(e_p) \mid l(e_\sigma) f H I)\, p(l(e_\sigma) \mid f H I)$. Clearly, if we include more line types
which are similar to each other, this assumption that line confusion can be ignored is less
justified. Since $l_i(e_p)$ and $\bar{l}_i(e_p)$ on the one hand, and $L_i(e_p)$ and $\bar{L}_i(e_p)$ on the other, are
exhaustive (i.e., the sum of their probabilities equals 1),

$$ p(L_i(e_p) \mid f H I) = \alpha_{ip} f_p + \beta_{ip} (1 - f_p) \tag{2} $$
$$ p(\bar{L}_i(e_p) \mid f H I) = 1 - p(L_i(e_p) \mid f H I) = (1 - \alpha_{ip}) f_p + (1 - \beta_{ip}) (1 - f_p) , $$

where $\alpha_{ip}$, the detection probability, and $\beta_{ip}$, the probability of a spurious detection, must
be calculated specifically for the $i$th burst and $p$th line type, and may depend on the
hypothesis $H$ under evaluation. We postpone the description of how we calculate $\alpha$ and $\beta$
to a later publication in this series.
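
As an illustrative sketch (not part of the original analysis), eqn. (2) can be evaluated directly once a detection probability, a false positive probability and a line frequency are supplied. The function names and the numerical values below are hypothetical placeholders; the actual $\alpha$ and $\beta$ must come from the detectability calculations deferred to a later paper.

```python
def detection_probability(alpha, beta, f):
    """Eqn. (2): probability that a given line type is reported in one burst.

    alpha: probability of detecting a line that is present
    beta:  probability of a spurious detection when no line is present
    f:     frequency with which the line type occurs in bursts
    """
    for name, value in (("alpha", alpha), ("beta", beta), ("f", f)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Real detections plus false positives.
    return alpha * f + beta * (1.0 - f)


def nondetection_probability(alpha, beta, f):
    """Complement of eqn. (2): (1 - alpha) f + (1 - beta)(1 - f)."""
    return 1.0 - detection_probability(alpha, beta, f)


# Hypothetical inputs, chosen only to exercise the formula.
p_det = detection_probability(alpha=0.8, beta=0.001, f=0.1)
p_non = nondetection_probability(alpha=0.8, beta=0.001, f=0.1)
print(p_det, p_non)
```

The two probabilities sum to one by construction, which mirrors the exhaustiveness argument used to obtain eqn. (2).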

We are presented with a pattern of detections and nondetections of absorption lines 
in the BATSE and Ginga data, an observed realization from among all possible outcomes,
which we denote by the proposition $D$. Note that our Bayesian calculations focus on one
particular realization from the universe of possible realizations. The global proposition $D$
is the product of the propositions $D_i$ concerning the line detections and nondetections in
individual bursts. Thus $D_i$ states that in the $i$th burst certain line types were detected, and
all others were not detected. For example, with two line types $D_i = L_i(e_1)\, \bar{L}_i(e_2)$ indicates
that in the $i$th burst line type 1 was detected and line type 2 was not.

If the detections of different line types are not coupled then the probability of observing 
D is just the product of the probabilities of each detection or nondetection as given by 
eqn. (2). The line types would be coupled if the presence of lines of different types were 
correlated or if line types could be confused; neither possibility is considered here. For 
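bursts where $n_d$ lines are detected the probability for the data given $f$ (the likelihood for
$f$) is

$$ p(D_i \mid f H I) = \prod_{\sigma=1}^{n_d} \left[ \alpha_{i\sigma} f_\sigma + \beta_{i\sigma} (1 - f_\sigma) \right]
   \prod_{\sigma=n_d+1}^{n_t} \left[ (1 - \alpha_{i\sigma}) f_\sigma + (1 - \beta_{i\sigma}) (1 - f_\sigma) \right] \tag{3} $$
$$ \qquad = \prod_{\sigma=1}^{n_d} \frac{\alpha_{i\sigma} f_\sigma + \beta_{i\sigma} (1 - f_\sigma)}{(1 - \alpha_{i\sigma}) f_\sigma + (1 - \beta_{i\sigma}) (1 - f_\sigma)}
   \prod_{\sigma=1}^{n_t} \left[ (1 - \alpha_{i\sigma}) f_\sigma + (1 - \beta_{i\sigma}) (1 - f_\sigma) \right] $$

where for clarity we number the detected line types first; a more complicated indexing is
necessary when different bursts with line detections are considered. The second formulation
in eqn. (3) is more compact and leads to useful limits. Note that in eqn. (3) the probability
of the observed outcome $D_i$ is calculated for all possible combinations of the presence and
absence of the line types; terms with a factor $f_\sigma$ assume the $\sigma$th line type is present and
those with $(1 - f_\sigma)$ assume the line is absent.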

For an ensemble of $N_G$ Ginga and $N_B$ BATSE bursts which have been searched the
likelihood, the probability for the observed realization, is

$$ p(D \mid f H I) = \prod_{k=1}^{N_G} p(D_k \mid f H I) \prod_{m=1}^{N_B} p(D_m \mid f H I) \tag{4} $$

which is valid even when the line types are coupled (i.e., if eqns. [1-3] are not valid).
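
The product structure of eqns. (3) and (4) can be sketched in a few lines of code for the uncoupled case. This is only a minimal illustration: the dictionary layout used for each burst and all numerical values are our own hypothetical choices, not data from either instrument.

```python
import math


def burst_likelihood(detected, alpha, beta, f):
    """Eqn. (3) for uncoupled line types: probability of one burst's pattern.

    All arguments are sequences indexed by line type; detected[p] is True
    if line type p was reported in this burst.
    """
    prob = 1.0
    for d, a, b, fp in zip(detected, alpha, beta, f):
        p_det = a * fp + b * (1.0 - fp)          # eqn. (2)
        prob *= p_det if d else (1.0 - p_det)
    return prob


def ensemble_likelihood(bursts, f):
    """Eqn. (4): product of the per-burst probabilities over all bursts."""
    return math.prod(
        burst_likelihood(b["detected"], b["alpha"], b["beta"], f) for b in bursts
    )


# Hypothetical two-burst ensemble with two line types.
f = [0.1, 0.2]
bursts = [
    {"detected": [True, False], "alpha": [1.0, 1.0], "beta": [0.0, 0.0]},
    {"detected": [False, False], "alpha": [0.5, 0.0], "beta": [0.0, 0.0]},
]
total = ensemble_likelihood(bursts, f)
print(total)
```

Bursts from both detectors can be placed in the same list, since eqn. (4) is simply a product over every burst that was searched.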

We write eqn. (4) one line type at a time for the current case of $n_G = 2$ and $n_B = 0$,
where we assume that there are only two line types. For clarity we place the line detection
in the first Ginga burst; this equation can be extended easily to the second line type by
reversing the definitions of the first and second bursts. The line frequency $f$ and detection
probabilities $\alpha$ and $\beta$ now refer to the single line type under consideration, and will have
different values for each line type. We assume that BATSE and Ginga observe the same
populations of strong bursts and therefore their line frequencies should be the same, but
for the purposes of the analysis below we write the likelihood in terms of separate line 
frequencies for each detector 

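$$ p(D \mid f_G f_B I) = \left( \alpha_1 f_G + \beta_1 (1 - f_G) \right)
   \prod_{k=2}^{N_G} \left( 1 - \alpha_k f_G - \beta_k (1 - f_G) \right) \times
   \prod_{m=1}^{N_B} \left( 1 - \alpha_m f_B - \beta_m (1 - f_B) \right) \tag{5} $$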

where line type indices have been suppressed. 

To make more concrete the dependencies on the numbers of Ginga and BATSE bursts 
in which lines could be detected, we present below simplified heuristic calculations in which 
we set $\alpha = 1$; frequently we will also set $\beta = 0$. The numbers of Ginga and BATSE bursts
must be reduced accordingly to compensate for the bursts in which lines could not be
detected. Empirically we find this approximation is reasonable for values of $N_G$ and $N_B$
equal to the sums of the actual $\alpha_i$.
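
To make the heuristic limit concrete, the following is a restatement of eqn. (5) under these simplifications (our own worked step, not an additional result). Writing $N'_G$ and $N'_B$ for the reduced burst numbers, as in the heuristic Bayes factors of §3, and setting $\alpha = 1$ and $\beta = 0$ gives

```latex
p(D \mid f_G f_B I) = f_G \, (1 - f_G)^{N'_G - 1} \, (1 - f_B)^{N'_B}
```

The single detection contributes the factor $f_G$, each of the remaining $N'_G - 1$ Ginga bursts contributes $(1 - f_G)$, and each of the $N'_B$ BATSE bursts contributes $(1 - f_B)$.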

The observed absorption lines are undoubtedly drawn from a continuous line parameter
space. Thus $f(e)$ should actually be a function of a number of continuous variables. The
likelihood function can be derived from the discrete line type likelihood. Let the discrete
$e_p$ be the vector of average parameter values over a cell within the continuous parameter
volume $\Delta e_p$. Then $p(L(e_p) \mid f I)$ is the probability a line will be found within $\Delta e_p$. If $p(e_p)$
is the line detection probability distribution (i.e., probability per unit parameter volume)
then eqn. (2) becomes

$$ p(L(e_p) \mid f I) = p(e_p)\, \Delta e_p = \alpha(e_p)\, f_p\, \Delta e_p + \beta(e_p)\, \Delta e_p\, (1 - f_p\, \Delta e_p) \tag{6} $$

where we recognize that the probability of finding a false positive is proportional to the
parameter volume $\Delta e_p$. Next we let $n_t$ go to infinity as $\Delta e$ goes to zero; $p(L(e_p) \mid f I)$
becomes a differential in the limiting process.

The confusion of one line type with another is unavoidable as we pass to the continuum 
limit: rarely will our spectral fits find the exact line parameters. The discrete likelihood 
functions derived above (which can easily be generalized for a continuous parameter space) 
are not directly relevant. We therefore defer derivation of continuous likelihoods until 
continuous line distributions are determined from a much larger number of line detections 
or are proposed by theories of burst emission. 



2.2. Bayesian Formalism

In the Bayesian formulation of statistics, our confidence in a hypothesis' truth is
expressed in terms of a probability (this is one of the major foundations of Bayesian
statistics). Thus $p(H \mid DI)$ is the posterior probability that hypothesis $H$ is true given the
data $D$ and information $I$. By Bayes' Theorem (a basic relation among probabilities; Loredo
1990)

$$ p(H \mid DI) = \frac{p(H \mid I)\, p(D \mid HI)}{p(D \mid I)} . \tag{7} $$

The probability $p(D \mid HI)$ is the likelihood for $H$, and is the quantity from which
"frequentist" statistical methods derive standard quantities such as $\chi^2$. The probability
$p(D \mid I)$ is the global likelihood, the probability for the realization $D$ under all possible
hypotheses; this factor acts as a normalization (since we use ratios of $p(H \mid DI)$, we need
not calculate $p(D \mid I)$). Finally, $p(H \mid I)$ is the prior probability that the hypothesis is true,
and is therefore a quantification of our expectations. Probabilities with no dependence
on $D$, such as $p(H \mid I)$, occur frequently within the Bayesian methodology, and are called
"priors." Priors for the current data set may be posterior probabilities from the evaluation
of a different experiment or observation.

We are concerned with the truth of $H$. However, the likelihood $p(D \mid fHI)$ in eqn. (4) is
a function of the unknown line frequency distribution $f$. While $f$ is intrinsically interesting,
for hypothesis evaluation the value of $f$ is necessary only to determine the more fundamental
$p(D \mid HI)$, or in Bayesian terminology, $f$ is a "nuisance" parameter. We eliminate $f$ by
the Bayesian process of marginalization: integrating over all possible values, weighted by
our prior expectation for this parameter's likely values, that is, by $f$'s prior. Thus

$$ p(D \mid HI) = \int df(e)\; p(f(e) \mid HI)\; p(D \mid fHI) \tag{8} $$

where the integration is over each line type.
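
The marginalization in eqn. (8) is easy to carry out numerically for a single line type. The sketch below is only an illustration under stated assumptions: it uses a midpoint-rule integral, a uniform prior, and the heuristic likelihood with $\alpha = 1$ and $\beta = 0$; the burst numbers are hypothetical and are not the values of Table 1.

```python
def marginal_likelihood(likelihood, prior, n_steps=20000):
    """Eqn. (8) for one line type: integrate prior(f) * likelihood(f) over [0, 1].

    A simple midpoint rule is used; likelihood and prior are callables of f.
    """
    h = 1.0 / n_steps
    midpoints = ((i + 0.5) * h for i in range(n_steps))
    return h * sum(prior(f) * likelihood(f) for f in midpoints)


# Hypothetical numbers of bursts in which a line would have been detectable.
N_G, N_B = 10, 5


def likelihood(f):
    """One detection among N_G + N_B bursts sharing a common line frequency f."""
    return f * (1.0 - f) ** (N_G + N_B - 1)


def uniform_prior(f):
    """Uniform prior p(f | H I) = 1 on [0, 1]."""
    return 1.0


p_D = marginal_likelihood(likelihood, uniform_prior)
print(p_D)
```

For this particular likelihood the integral has the closed form $1/[(N_G + N_B)(N_G + N_B + 1)]$, so the numerical result can be checked against it; a non-uniform prior only requires swapping the `prior` callable.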

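To compare the relative probabilities that hypotheses $H_0$ and $H_x$ are true, we construct
the posterior odds ratio

$$ \frac{p(H_0 \mid DI)}{p(H_x \mid DI)} = \frac{p(H_0 \mid I)}{p(H_x \mid I)}\, \frac{p(D \mid H_0 I)}{p(D \mid H_x I)}
   = \frac{p(H_0 \mid I)}{p(H_x \mid I)}\, \frac{\int df\, p(f \mid H_0 I)\, p(D \mid f H_0 I)}{\int df\, p(f \mid H_x I)\, p(D \mid f H_x I)} . \tag{9} $$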

This is the basic equation. Note that we do not have to calculate $p(D \mid I)$. The likelihood
ratio $p(D \mid H_0 I)/p(D \mid H_x I)$, often called the Bayes factor $B$, can be calculated. With
the current pattern of detections and nondetections, the Bayes factor will usually
favor the hypothesis that the Ginga and BATSE observations are inconsistent, or
that our understanding of these instruments is faulty. On the other hand, the factor
$p(H_0 \mid I)/p(H_x \mid I)$, the prior odds ratio, is an expression of our prior expectations of the
relative truth of each hypothesis. As such, this factor will often be subjective; for example,
if $H_0$ states that the BATSE detectors function as expected (i.e., are capable of detecting
lines), and $H_x$ states that BATSE is unable to detect lines, then the prior odds quantifies
our confidence in the BATSE detectors. Usually our conclusion regarding BATSE-Ginga
consistency depends on our assessment of the relative values of the Bayes factor and
the ratio of the hypothesis priors (i.e., whether the priors compensate for a Bayes factor
unfavorable for consistency). The arbitrariness in $p(H_0 \mid I)/p(H_x \mid I)$ makes explicit the
subjectivity in deciding when an inconsistency exists. Clearly our threshold for accepting a
conclusion consistent with our expectations is less stringent than for a surprising conclusion.

We use the above likelihood function to estimate the line frequencies by calculating
the posterior distribution for $f$, $p(f(e) \mid DHI)$, based on the observed realization $D$. This
distribution can be derived from the BATSE, Ginga or combined datasets (i.e., by using
different definitions of $D$). By Bayes' Theorem

$$ p(f(e) \mid DHI) = \frac{p(f(e) \mid HI)\, p(D \mid f(e) HI)}{p(D \mid HI)}
   = \frac{p(f(e) \mid HI)\, p(D \mid f(e) HI)}{\int df(e)\, p(f(e) \mid HI)\, p(D \mid f(e) HI)} . \tag{10} $$

The line frequency prior, $p(f(e) \mid HI)$, depends on $H$ and $I$. For a uniform prior,
$p(f \mid HI) = 1$, the posterior distribution for $f$ is proportional to the probability of $D$ as a
function of $f$.
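
For the heuristic case ($\alpha = 1$, $\beta = 0$, uniform prior) the posterior of eqn. (10) can be written in closed form, and a short sketch may help fix ideas. The following is an illustration under those assumptions only: with $n$ detections among $N$ bursts the posterior is proportional to $f^n (1-f)^{N-n}$, normalized with the Beta function integral quoted in §3. The function names and the example numbers are hypothetical.

```python
from math import comb


def posterior_density(f, n, N):
    """Posterior p(f | D H I) for n detections in N bursts.

    Assumes alpha = 1, beta = 0 and a uniform prior on [0, 1]; the
    normalization (N + 1)! / (n! (N - n)!) equals (N + 1) * comb(N, n).
    """
    return (N + 1) * comb(N, n) * f**n * (1.0 - f) ** (N - n)


def posterior_mean(n, N):
    """Mean of the posterior above: (n + 1) / (N + 2)."""
    return (n + 1) / (N + 2)


def upper_limit_no_detections(N, credibility):
    """Value f_up with P(f <= f_up) = credibility when n = 0.

    Follows from the n = 0 cumulative distribution 1 - (1 - f)^(N + 1).
    """
    return 1.0 - (1.0 - credibility) ** (1.0 / (N + 1))


# Hypothetical example: one detection among ten bursts, and a 95% upper
# limit for five bursts with no detections.
print(posterior_mean(1, 10))
print(upper_limit_no_detections(5, 0.95))
```

Choosing which bursts enter $n$ and $N$ corresponds to the different definitions of $D$ mentioned above (one detector alone, or both combined).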



3. HYPOTHESES 



Using the Bayesian methodology presented above in §2 for detections in two Ginga 
and no BATSE bursts, we compare the hypothesis $H_0$ stating that there is no inconsistency
between the Ginga and BATSE results to specific hypotheses which contradict $H_0$. In detail
the consistency hypothesis $H_0$ states that: the detection capabilities of both instruments are
understood; lines exist; and the detection threshold has been set high enough to virtually
eliminate false positives (i.e., we set $\beta = 0$). Each line type has its own Bayes factor, and the
overall odds ratio is the product of the two Bayes factors and the ratio of hypothesis priors 
(the prior odds). In the following we present the Bayes factor for the single detection of a 
given line type in the Ginga data, and none in the BATSE bursts. 

First, define $H_1$ to be the hypothesis that BATSE is unable to detect lines, even if they
are present. Thus we set BATSE's line detection probability $\alpha$ to zero for hypothesis $H_1$.
Consequently the Bayes factor for the comparison of $H_0$ to $H_1$ is

$$ B_1 = \frac{\int df\, p(f \mid H_0 I)\, \alpha_1 f \prod_{k=2}^{N_G} (1 - \alpha_k f) \prod_{m=1}^{N_B} (1 - \alpha_m f)}
              {\int df\, p(f \mid H_1 I)\, \alpha_1 f \prod_{k=2}^{N_G} (1 - \alpha_k f)} \tag{11} $$



where $\alpha$ indexed with $k$ refers to Ginga bursts and with $m$ to BATSE bursts. Since this is
the Bayes factor for one line type, the line type indices are suppressed. If for our heuristic
calculation we set $\alpha = 1$ for hypothesis $H_0$ and reduce $N_G$ and $N_B$ to $N'_G$ and $N'_B$ (the
strong bursts for which $\alpha \sim 1$), then

$$ B_1 = \frac{\int df\, p(f \mid H_0 I)\, f (1 - f)^{N'_G + N'_B - 1}}{\int df\, p(f \mid H_1 I)\, f (1 - f)^{N'_G - 1}}
       = \frac{N'_G (N'_G + 1)}{(N'_G + N'_B)(N'_G + N'_B + 1)} \tag{12} $$

where we used uniform line frequency priors $p(f \mid HI) = 1$ in calculating the last term; as
will be discussed below (§4), these are formally correct priors. The analytic expressions for
the Bayes factor in the heuristic calculations use the Beta function

$$ \int_0^1 df\, f^n (1 - f)^{N-n} = \frac{n!\,(N-n)!}{(N+1)!} . \tag{13} $$

Since $n$ is usually 0 or 1, and $N$ is fairly large, small values of $f$ dominate the integral.
Thus, if instead of a uniform prior from 0 to 1 we use a uniform prior from 0 to $f_{max}$, the
dependence on $n$ and $N$ will be the same (to within ~25%), with a normalization factor
(from the prior) of $1/f_{max}$. This normalization factor will appear in both the denominator
and numerator of eqn. (12), and $B_1$ will change by very little.
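
The heuristic Bayes factor of eqn. (12) follows from two applications of the Beta function integral of eqn. (13), which the following sketch makes explicit. The function names are our own, and the burst numbers are hypothetical placeholders rather than the values of Table 1.

```python
from math import factorial


def beta_integral(n, N):
    """Eqn. (13): integral of f^n (1 - f)^(N - n) over 0 <= f <= 1."""
    return factorial(n) * factorial(N - n) / factorial(N + 1)


def bayes_factor_1(n_ginga, n_batse):
    """Eqn. (12): H_0 (both instruments understood) versus H_1 (BATSE blind).

    Uniform priors, alpha = 1 and beta = 0 are assumed; the arguments are
    the reduced burst numbers N'_G and N'_B.
    """
    numerator = beta_integral(1, n_ginga + n_batse)   # f (1-f)^(N'_G+N'_B-1)
    denominator = beta_integral(1, n_ginga)           # f (1-f)^(N'_G-1)
    return numerator / denominator


# Hypothetical reduced burst numbers.
b1 = bayes_factor_1(10, 5)
print(b1)
```

With no BATSE bursts in which a line could have been seen ($N'_B = 0$) the factor is exactly 1, as expected: the BATSE data then cannot discriminate between the two hypotheses.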

Next, the hypothesis $H_2$ states there are no absorption lines. Consequently the
reported Ginga lines must all be false positives. Thus the prior $p(f \mid H_2 I) = \delta(f)$ and the
false positive probability $\beta$ must be nonzero (and assumed constant) for $H_2$:

$$ B_2 = \frac{\int df\, p(f \mid H_0 I)\, \alpha_1 f \prod_{k=2}^{N_G} (1 - \alpha_k f) \prod_{m=1}^{N_B} (1 - \alpha_m f)}
              {\beta_1 \prod_{k=2}^{N_G} (1 - \beta_k) \prod_{m=1}^{N_B} (1 - \beta_m)} \tag{14} $$

For our heuristic calculation we set $\alpha = 1$ for $H_0$ and let $\beta$ be nonzero for $H_2$:

$$ B_2 = \frac{\int df\, p(f \mid H_0 I)\, f (1 - f)^{N'_G + N'_B - 1}}{\beta (1 - \beta)^{N'_G + N'_B - 1}}
       = \frac{1}{\beta (1 - \beta)^{N'_G + N'_B - 1} (N'_G + N'_B)(N'_G + N'_B + 1)} \tag{15} $$

where again we evaluated the integral using a uniform line frequency prior. If we cut off
the prior at $f_{max}$, then $B_2$ increases by a factor of $1/f_{max}$ (the uniform prior only appears
in the numerator). This Bayes factor can be minimized by maximizing $\beta (1 - \beta)^{N'_G + N'_B - 1}$:

$$ \beta_{max} = \frac{1}{N'_G + N'_B} \tag{16} $$

$$ B_2 = \frac{1}{N'_G + N'_B + 1} \left( \frac{N'_G + N'_B}{N'_G + N'_B - 1} \right)^{N'_G + N'_B - 1} . \tag{17} $$
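
A brief numerical check of eqns. (15)-(17) is sketched below. It is illustrative only: the function names are ours, the burst numbers are hypothetical, and uniform line frequency priors with $\alpha = 1$ under $H_0$ are assumed, as in the heuristic calculation.

```python
def bayes_factor_2(n_ginga, n_batse, beta):
    """Eqn. (15): H_0 versus H_2 (no lines; the detection is a false positive)."""
    total = n_ginga + n_batse
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return 1.0 / (beta * (1.0 - beta) ** (total - 1) * total * (total + 1))


def beta_max(n_ginga, n_batse):
    """Eqn. (16): false positive probability that minimizes B_2."""
    return 1.0 / (n_ginga + n_batse)


def min_bayes_factor_2(n_ginga, n_batse):
    """Eqn. (17): the minimum of B_2 over beta (requires at least two bursts)."""
    total = n_ginga + n_batse
    if total < 2:
        raise ValueError("need N'_G + N'_B >= 2")
    return (total / (total - 1)) ** (total - 1) / (total + 1)


# Hypothetical reduced burst numbers.
n_g, n_b = 10, 5
b2_min = min_bayes_factor_2(n_g, n_b)
b2_at_beta_max = bayes_factor_2(n_g, n_b, beta_max(n_g, n_b))
print(b2_min, b2_at_beta_max)
```

Evaluating eqn. (15) at the $\beta$ of eqn. (16) reproduces eqn. (17), and any other choice of $\beta$ gives a larger $B_2$, that is, a result more favorable to the consistency hypothesis.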

Finally we use as a generalized inconsistency hypothesis $H_3$ the supposition that
the Ginga and BATSE bursts are characterized by different line frequencies. If $H_3$ is
favored we would not believe there actually are different line frequencies, but instead would
conclude the instruments are not well understood and the line detectability probabilities are
incorrect. Note that an error in the line detectability calculations, which can be modeled
by hypothesis $H_3$, need not imply that BATSE is unable to detect lines (hypothesis $H_1$).
Differences between an instrument's true and calculated line detection capabilities can
indeed be modeled by changes in the line frequency. The Bayes factor is

$$ B_3 = \frac{\int df\, p(f \mid H_0 I)\, \alpha_1 f \prod_{k=2}^{N_G} (1 - \alpha_k f) \prod_{m=1}^{N_B} (1 - \alpha_m f)}
              {\int\!\!\int df_G\, df_B\, p(f_G \mid H_3 I)\, p(f_B \mid H_3 I)\, \alpha_1 f_G \prod_{k=2}^{N_G} (1 - \alpha_k f_G) \prod_{m=1}^{N_B} (1 - \alpha_m f_B)} \tag{18} $$

where $\alpha$ indexed with $k$ refers to Ginga bursts and with $m$ to BATSE bursts. Note that
the integral in the numerator can be viewed as the integral in the denominator with an
extra factor of $\delta(f_G - f_B)$ in the integrand. The double integral over $f_G$ and $f_B$ for $H_3$
in this equation includes $f_B = f_G$, but the fraction of the $f_G$-$f_B$ plane where $f_B = f_G$
is infinitesimal, and therefore the case $f_B = f_G$ is given no weight in the integral. For our
heuristic calculation we set $\alpha = 1$ and reduce $N_G$ and $N_B$ to $N'_G$ and $N'_B$:

$$ B_3 = \frac{\int df\, p(f \mid H_0 I)\, f (1 - f)^{N'_G + N'_B - 1}}
              {\int df_G\, p(f_G \mid H_3 I)\, f_G (1 - f_G)^{N'_G - 1} \int df_B\, p(f_B \mid H_3 I)\, (1 - f_B)^{N'_B}}
       = \frac{N'_G (N'_G + 1)(N'_B + 1)}{(N'_G + N'_B)(N'_G + N'_B + 1)} \tag{19} $$

where the last expression was calculated with uniform line frequency priors. If the uniform
prior extends only to $f_{max}$ then $B_3$ decreases by a factor of $f_{max}$ since the prior occurs twice
in the denominator and only once in the numerator.
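
The closed form in eqn. (19) can be verified against the Beta function integrals from which it is built. The sketch below does this for hypothetical burst numbers (not the values of Table 1), assuming uniform priors, $\alpha = 1$ and $\beta = 0$; the function names are our own.

```python
from math import factorial


def beta_integral(n, N):
    """Integral of f^n (1 - f)^(N - n) over 0 <= f <= 1 (the Beta function)."""
    return factorial(n) * factorial(N - n) / factorial(N + 1)


def bayes_factor_3(n_ginga, n_batse):
    """Eqn. (19) from its integrals: common frequency versus separate ones."""
    numerator = beta_integral(1, n_ginga + n_batse)    # shared f
    ginga_part = beta_integral(1, n_ginga)             # f_G (1-f_G)^(N'_G-1)
    batse_part = beta_integral(0, n_batse)             # (1-f_B)^(N'_B)
    return numerator / (ginga_part * batse_part)


def bayes_factor_3_closed_form(n_ginga, n_batse):
    """Right-hand side of eqn. (19)."""
    total = n_ginga + n_batse
    return n_ginga * (n_ginga + 1) * (n_batse + 1) / (total * (total + 1))


# Hypothetical reduced burst numbers.
b3 = bayes_factor_3(10, 5)
print(b3, bayes_factor_3_closed_form(10, 5))
```

Comparing with eqn. (12) shows that in this heuristic limit $B_3 = (N'_B + 1) B_1$: allowing BATSE its own free line frequency is penalized less than declaring BATSE unable to detect lines at all.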

$B_3$ can be used to assess whether the Ginga data increased our knowledge of the
line frequency given the BATSE results; the appropriate hypothesis priors are required to
answer this different question. In this case $B_3$ evaluates the BATSE data alone using two
different priors for $f$. The numerator uses a prior based on the Ginga data (the integral
over $f_G$ in the denominator normalizes this prior) while the denominator uses a uniform
prior (the integral over $f_B$). Here the posterior for $f$ from the Ginga data is used as a prior
for the BATSE observations.



4. ILLUSTRATIVE CALCULATION 



A detailed calculation using the detection probabilities $\alpha$ and false positive probabilities
$\beta$ for each Ginga and BATSE burst will be presented in a subsequent paper in this series.
Here we present calculations using the heuristic Bayes factors (eqns. [12], [17] and [19]).
The illustrative set of $N'_G$ and $N'_B$ for the GB880205 and GB870303 line types presented
in Table 1 is based on more complete calculations using preliminary values of $\alpha$ and $\beta$.
Therefore by considering the Bayes factors presented here we can reach tentative conclusions
which will most likely remain valid after our more detailed calculation. The value of $N'_B$ is
surprisingly small given the large number of BATSE bursts which have been searched: most
bursts were not strong enough for Ginga-like spectral features to be detectable. In addition,
few BATSE spectra extend below ~20 keV, which would enable detection of GB870303-like
lines, and therefore we give $N'_B$ a very small value for this line type. The Ginga $N'_G$ is based
on the calculations of Fenimore et al. (1993).

For our primary analysis we assumed a uniform prior for the line frequencies, except 
for hypothesis $H_2$ which states $f = 0$; for all other hypotheses $f$ can be any value between
0 and 1, or $p(f \mid H I) = 1$. Formally this prior must utilize information prior to the Ginga
detector. As discussed in the Introduction, there is insufficient pre-Ginga information to 
calculate a line frequency, and we use the least informative line frequency prior (i.e., the 
prior with the least information content). Table 1 lists the Bayes factor for each set of 
hypothesis comparisons using the lines in GB880205 alone, the line in GB870303, and both 
line sets together (the column labeled "Joint"). Note that the posterior odds (eqn. [9]), 
which indicates the favored hypothesis, is the product of the Bayes factor (the ratio of the 
likelihoods for each hypothesis) and prior odds (the ratio of hypothesis priors) quantifying 
our expectations and knowledge before the data were obtained. Table 1 also presents the 
prior odds for each hypothesis comparison (as discussed below), and the resulting posterior 
odds. 
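
The relation between the three quantities can be illustrated with a short Python sketch that multiplies the joint Bayes factors quoted in Table 1 (uniform prior between 0 and 1) by illustrative prior odds. The function name `posterior_odds` and the dictionary layout are hypothetical, and the prior odds of 10 for the $H_0$ : $H_3$ comparison is an assumption taken from the "e.g., ~10" value discussed in the text (the table lists it only as a lower bound).

```python
def posterior_odds(bayes_factor: float, prior_odds: float) -> float:
    """Posterior odds = Bayes factor x prior odds (eqn. [9])."""
    return bayes_factor * prior_odds


# (joint Bayes factor from Table 1, illustrative prior odds)
comparisons = {
    "H0:H1": (0.0196, 100.0),
    "H0:H2 (beta = beta_max)": (0.00653, 100.0),
    "H0:H3": (7.75, 10.0),   # prior odds of 10 is an assumed example value
}

results = {name: posterior_odds(b, odds) for name, (b, odds) in comparisons.items()}
for name, value in results.items():
    print(f"{name}: {value:.3g}")
```

The products (about 2, 0.7 and 80) reproduce the rounded posterior odds column of Table 1.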

Because the data appear discrepant with two Ginga and no BATSE detections, we 
might intuitively expect Bayes factors less than 1, favoring the specific hypotheses regarding 
instrumental deficiencies ($H_1$: BATSE is unable to detect lines; and $H_2$: absorption lines
do not exist and therefore the Ginga detections must be spurious). On the other hand, the
prior odds (the ratio of the hypothesis priors) are greater than 1, favoring our assertion 
that the instruments are understood. Based on prelaunch calibration tests and on-orbit 
performance and observations (e.g., the Her X-1 pulsar spectrum; Briggs et al. 1994) we
are confident that BATSE could detect lines if present (Teegarden et al. 1993; Band et al.
1993a; Palmer et al. 1994a), and therefore we assign a high value (e.g., $\sim 100$) to the prior
odds $p(H_0 \mid I)/p(H_1 \mid I)$.

The prior for hypothesis $H_2$ consists of the product of priors for both line nonexistence
and spurious Ginga detections. Although line nonexistence is the fundamental statement
of $H_2$, a necessary consequence is that the claimed detections are false positives; priors are
required for both propositions. Before the report of the Ginga lines the confidence in the
existence of absorption lines in GRB spectra was not very high, and therefore formally
our prior for line existence, which should be based on pre-Ginga information, is not large.
On the other hand, an unrealistically large value of the probability of spurious detections
$\beta$ minimizes the Bayes factor for the $H_0$ to $H_2$ comparison; the Ginga team has studied
their claimed detections and is confident they are not false positives (E. Fenimore 1993, 
private communication; C. Graziani 1993, private communication). The detection threshold 
has been set high enough to make the probability of a statistical false positive very small, 
and the Ginga instrument team worked hard to eliminate systematic effects which could 
produce a spurious detection. Note that a systematic effect could increase the false positive 
probability significantly for all bursts. Therefore, based on the expectations both that lines 
exist and that the false detection probability is low, we assign a high value (e.g., $\sim 100$) to
the prior odds of $H_0$ relative to $H_2$.

The Bayes factor $B_3$ is surprisingly close to 1 for the comparison of consistency ($H_0$)
and generalized inconsistency ($H_3$). Figure 1 explores the dependence of $B_3$ on $N'_B$ for
$N'_G = 10$ assuming there are no line detections in the BATSE spectra; the Bayes factor for
multiple line types is the product of the single-detection Bayes factors. This figure shows
the number of BATSE bursts without line detections necessary to conclude there is an
inconsistency (the effective value of $N_G$ is $N'_G \sim 10$). As can be seen, for one line type
$B_3 \propto 1/N'_B$ when $N'_B \gg N'_G$ (see also eqn. [19]). We assume we understand our instruments
and therefore assign prior odds favoring $H_0$ over $H_3$ (e.g., $\sim 10$), although not by as large a
factor as for $H_0$ relative to $H_1$ or $H_2$ since the implications of $H_3$ are not as extreme as these
other two hypotheses. Inaccuracies in our line detectability calculations, which are more
likely than BATSE's total inability to detect lines ($H_1$), can be modeled as differences in
the line frequencies. Figure 2 shows $B_3$ for different values of $N'_G$; $B_3$ increases with $N'_G$
since the likely line frequency from the Ginga data alone decreases. We conclude from these
figures that an order of magnitude more strong BATSE bursts without a line detection are
necessary for the odds ratio to fall below unity, that is, for hypothesis $H_3$ to be favored.
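
The scaling with $N'_B$ can be explored with a small Python sketch based on the uniform-prior closed form of eqn. (19) for a single line type. The helper names `bayes_factor_b3` and `bursts_needed`, and the search limit `n_b_max`, are illustrative assumptions; the thresholds of 1 and 1/10 are example values (the latter being the Bayes factor that would offset illustrative prior odds of 10).

```python
from fractions import Fraction
from typing import Optional


def bayes_factor_b3(n_g: int, n_b: int) -> Fraction:
    """Closed form of eqn. (19): one Ginga detection in n_g bursts,
    no BATSE detections in n_b bursts, uniform priors on [0, 1]."""
    return Fraction(n_g * (n_g + 1) * (n_b + 1), (n_g + n_b) * (n_g + n_b + 1))


def bursts_needed(n_g: int, threshold: Fraction, n_b_max: int = 100_000) -> Optional[int]:
    """Smallest n_b for which B_3 falls strictly below the threshold
    (None if that does not happen up to n_b_max)."""
    for n_b in range(n_b_max + 1):
        if bayes_factor_b3(n_g, n_b) < threshold:
            return n_b
    return None


n_unity = bursts_needed(10, Fraction(1))       # B_3 < 1
n_tenth = bursts_needed(10, Fraction(1, 10))   # B_3 < 1/10
print(n_unity, n_tenth)
```

Under these assumptions, with $N'_G = 10$ the Bayes factor first drops below 1 at $N'_B = 90$ and below 1/10 at $N'_B = 1080$, in line with the $1/N'_B$ behavior at large $N'_B$.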

The surprisingly large value of the Bayes factor comparing $H_0$ and $H_3$ results from the
structure of the likelihood function $p(D \mid f_G f_B I)$ over the space of the Ginga
and BATSE line frequencies $f_G$ and $f_B$ (eqn. [5]). The line frequencies are marginalized
by integrating over this space. Figure 3 shows this space with logarithmic contours for
the likelihood with $N'_G = 10$ and $N'_B = 35$; while this example is based on the values for
GB880205 in Table 1, it is meant to be illustrative. Under the hypothesis $H_0$ that there is
a single line frequency $f = f_G = f_B$ the line frequency is marginalized by integrating along
the diagonal. On the other hand, for $f_G \neq f_B$ assumed by $H_3$ the integration is over the
entire region. The peak value of the likelihood is not on the line $f_G = f_B$, yet the average
along this line is comparable to the average over the entire region!

Figure 4 gives the distribution for likely values of the line frequency $p(f \mid D H_0 I)$ from
eqn. (10) for the Ginga and BATSE datasets alone, and for the joint dataset. Note that 
the abscissa is the logarithm of the line frequency, and therefore the areas under the curves 
are not proportional to the probability assigned these regions. As can be seen, there is 
a substantial overlap between the line frequency distributions for each instrument's data 
alone. Indeed, the distribution for $f$ from the joint dataset is the (normalized) product of
these two distributions. For a range of $f$ values the Ginga detections can be a statistical
fluctuation up and the absence of BATSE detections a fluctuation down. This explains the
larger than expected value of $p(D \mid H_0 I)$, the probability of obtaining the data assuming
consistency $H_0$. On the other hand, we find a small value for the probability of obtaining
the BATSE results alone, which is the right hand factor of $p(D \mid H_3 I)$, the likelihood for $H_3$ (the
denominator of $B_3$). With a uniform prior for the line frequency the probability of detecting
lines in $n_B$ bursts out of $N'_B$ searched is $1/(N'_B + 1)$, independent of $n_B$. Therefore finding
no lines in the BATSE data is only one of many equally likely results, hence the small
probability of its occurrence.
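
The statement that every detection count is equally likely under a uniform line frequency prior can be verified directly. The sketch below is a minimal check, assuming perfect detectability in each of the $N'_B$ bursts and counting detections without regard to which bursts they occur in (hence the binomial coefficient); the function name `prob_n_detections` is a hypothetical choice.

```python
from fractions import Fraction
from math import comb, factorial


def prob_n_detections(n: int, n_total: int) -> Fraction:
    """Probability of n detections in n_total bursts, marginalized over a
    uniform prior on the line frequency f in [0, 1]."""
    # Beta integral: integral of f^n (1-f)^(N-n) df = n! (N-n)! / (N+1)!
    beta_integral = Fraction(factorial(n) * factorial(n_total - n),
                             factorial(n_total + 1))
    return comb(n_total, n) * beta_integral


n_total = 35  # illustrative N'_B for the GB880205 line type
probs = [prob_n_detections(n, n_total) for n in range(n_total + 1)]
print(probs[0], probs[17], probs[35], sum(probs))
```

Each of the 36 possible outcomes has probability 1/36, so the nondetection result $n_B = 0$ is no more probable than any other count.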

As was discussed above, we have been using uniform line frequency priors between
0 and 1, $p(f \mid I) = 1$, because we cannot determine dependable line occurrence rates from the
pre-Ginga reports of line detections. Although there are many problems in assessing the
line frequency in the KONUS bursts, we can naively use a uniform prior to $f_{\max} = 0.2$ (the
line frequency KONUS reported). The resulting Bayes factors are provided in Table 1. As
can be seen $B_1$ changes by less than a factor of 2, while $B_2$ increases and $B_3$ decreases by
factors of order $1/f_{\max} \sim 5$. However, our basic conclusions are unaffected.
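
The effect of truncating the prior can be reproduced with the following Python sketch, which evaluates the integrals of eqn. (19) for a normalized uniform prior on $[0, f_{\max}]$ using elementary antiderivatives. The function names are hypothetical, and the sketch assumes the same heuristic setup as eqn. (19) (one Ginga detection, no BATSE detections, perfect detectability).

```python
def int_one_detection(n: int, f_max: float) -> float:
    """Integral of f (1-f)^(n-1) from 0 to f_max."""
    return (1.0 - (1.0 - f_max) ** n * (1.0 + n * f_max)) / (n * (n + 1))


def int_no_detection(n: int, f_max: float) -> float:
    """Integral of (1-f)^n from 0 to f_max."""
    return (1.0 - (1.0 - f_max) ** (n + 1)) / (n + 1)


def b3_truncated(n_g: int, n_b: int, f_max: float) -> float:
    """B_3 with a uniform prior density 1/f_max on [0, f_max]."""
    prior = 1.0 / f_max
    # H_0: one shared line frequency, so the prior enters once.
    numerator = prior * int_one_detection(n_g + n_b, f_max)
    # H_3: independent Ginga and BATSE frequencies, so the prior enters twice.
    denominator = (prior * int_one_detection(n_g, f_max)) * \
                  (prior * int_no_detection(n_b, f_max))
    return numerator / denominator


b3_880205 = b3_truncated(10, 35, 0.2)
b3_870303 = b3_truncated(15, 10, 0.2)
print(b3_880205, b3_870303, b3_880205 * b3_870303)
```

With $f_{\max} = 0.2$ this gives approximately 0.56 and 1.01 for the two line types and 0.57 for their product, as in the lower part of Table 1; setting `f_max = 1.0` recovers the untruncated values.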



5. "FREQUENTIST" ANALYSIS 

Before adopting the Bayesian methodology presented here, we defined the consistency 
statistic as the probability $p(n_G \geq 2, n_B = 0 \mid H_0 I)$ that Ginga would detect 2 or more
lines, and BATSE none (Band et al. 1993c). This is the region in the space of all possible 
realizations where the observations would appear to be at least as discrepant as the current 
detections and nondetections. A small value was understood to indicate inconsistency 
between the Ginga and BATSE results. Frequentist calculations consider how likely the 
data are for a given hypothesis. However, the probability of obtaining the observed data 
may be vanishingly small if there are many possible outcomes (for example, observing 
a particular value of a continuous variable), and therefore the probability is calculated 
for a region bounded by the observations. By working with only one hypothesis, we do
not know whether the observations are any more likely for any reasonable alternative to 
that hypothesis. On the other hand, Bayesian statistics compares hypotheses using the 
probabilities of obtaining the observed outcome under the hypotheses without regard for the 
magnitude of these probabilities. However, our lack of imagination in devising reasonable 
alternative hypotheses may lull us into complacency if we find odds ratios favoring the null 
hypothesis (here the consistency hypothesis $H_0$) over unlikely hypotheses. It is therefore
instructive to consider our frequentist consistency measure. 

Since the Ginga lines are actually single detections of two very different line types, we 
calculate the product of the probabilities of one or more detections of two types. For a 
single line type this probability is 

$$
P(n_G \geq 1, n_B = 0 \mid f, N_G, N_B) = \prod_{m=1}^{N_B} \left(1 - \alpha_m f - \beta_m (1-f)\right) \left[1 - \prod_{k=1}^{N_G} \left(1 - \alpha_k f - \beta_k (1-f)\right)\right] \qquad (20)
$$

As with the Bayesian analysis we simplify this expression to see the dependencies on the 
number of bursts. Thus we set $\alpha = 1$ and $\beta = 0$, and use the effective number of bursts $N'_B$
and $N'_G$:

$$
P(n_G \geq 1, n_B = 0 \mid f, N'_G, N'_B) = (1-f)^{N'_B} \left(1 - (1-f)^{N'_G}\right) . \qquad (21)
$$

These expressions are functions of the unknown line frequency $f$. By maximizing this
probability with respect to $f$ we establish an upper limit for consistency. The line frequency
which maximizes the probability in eqn. (21) is $f = 1 - (N'_B/(N'_B + N'_G))^{1/N'_G}$, giving

$$
P_{\max}(n_G \geq 1, n_B = 0 \mid f, N'_G, N'_B) = \left(\frac{N'_B}{N'_G + N'_B}\right)^{N'_B/N'_G} \left(\frac{N'_G}{N'_G + N'_B}\right) . \qquad (22)
$$

Table 2 lists this probability evaluated for the values of $N'_G$ and $N'_B$ in Table 1. As
can be seen, maximizing the probability with respect to $f$ gives an upper limit of 3% on the
probability that Ginga and BATSE would appear this discrepant if lines exist and the detectors
function as understood.
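
The maximization in eqns. (21) and (22) can be checked numerically. The following Python sketch, with hypothetical function names, evaluates the simplified probability of eqn. (21), the maximizing line frequency, and the closed-form maximum of eqn. (22), and confirms on a grid of $f$ values that no larger probability is found. It assumes $\alpha = 1$ and $\beta = 0$ as in the simplified expressions.

```python
def p_discrepant(f: float, n_g: int, n_b: int) -> float:
    """Eqn. (21): at least one Ginga detection and no BATSE detections."""
    return (1.0 - f) ** n_b * (1.0 - (1.0 - f) ** n_g)


def f_at_max(n_g: int, n_b: int) -> float:
    """Line frequency that maximizes eqn. (21)."""
    return 1.0 - (n_b / (n_b + n_g)) ** (1.0 / n_g)


def p_max(n_g: int, n_b: int) -> float:
    """Eqn. (22): maximum of eqn. (21) over f."""
    return (n_b / (n_g + n_b)) ** (n_b / n_g) * (n_g / (n_g + n_b))


p_880205 = p_max(10, 35)
p_870303 = p_max(15, 10)
p_joint = p_880205 * p_870303

# Grid check: no f in [0, 1] should exceed the closed-form maximum.
grid_max = max(p_discrepant(i / 10000.0, 10, 35) for i in range(10001))
print(p_880205, p_870303, p_joint, grid_max <= p_880205 + 1e-12)
```

The three probabilities come out near 0.092, 0.33 and 0.030, matching the first row of statistics in Table 2.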

An alternative consistency measure is the probability that all the detections 
would be in the Ginga bursts given a set number of detections, the product of 
$P(n_G = 1, n_B = 0 \mid n_G + n_B = 1, N_G, N_B)$ for each line type (Palmer et al. 1994a). This
probability for one line type, assuming $\alpha = 0$ or 1, is

$$
P(n_G = 1, n_B = 0 \mid n_G + n_B = 1, N'_G, N'_B) = \frac{N'_G}{N'_G + N'_B} . \qquad (23)
$$

Table 2 presents this consistency measure evaluated for our illustrative example; there is a 
13% probability that both detections would be in the Ginga bursts, which would hardly be 
considered a discrepancy. This measure is more favorable for consistency than the previous
one (eqn. [22]) because it assumes there is only a single detection of a given line type, 
thereby restricting the space from which our observed result is drawn. 
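
The conditional probability in eqn. (23) does not depend on the line frequency, which can be seen by forming the ratio of the two binomial probabilities explicitly. The Python sketch below is a minimal illustration with hypothetical function names; it assumes lines are detectable ($\alpha = 1$) in each of the $N'_G$ and $N'_B$ effective bursts, and the test frequencies are arbitrary.

```python
from fractions import Fraction


def p_all_in_ginga(n_g: int, n_b: int) -> Fraction:
    """Eqn. (23): the single detection falls among the Ginga bursts."""
    return Fraction(n_g, n_g + n_b)


def p_conditional(f: Fraction, n_g: int, n_b: int) -> Fraction:
    """Same quantity from the ratio of binomial probabilities at line
    frequency f (requires 0 < f < 1)."""
    # Exactly one Ginga detection (in any of n_g bursts), no BATSE detections.
    joint = n_g * f * (1 - f) ** (n_g - 1) * (1 - f) ** n_b
    # Exactly one detection among all n_g + n_b bursts.
    total = (n_g + n_b) * f * (1 - f) ** (n_g + n_b - 1)
    return joint / total


p_880205 = p_all_in_ginga(10, 35)
p_870303 = p_all_in_ginga(15, 10)
p_joint = p_880205 * p_870303
print(float(p_880205), float(p_870303), float(p_joint))
```

The values 0.22, 0.60 and 0.13 agree with the second row of statistics in Table 2.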

Note that these two frequentist consistency measures test whether the detection of a
given line type in one instrument but not another constitutes a discrepancy between these
two instruments, but not whether finding all the detections in the same instrument is a
discrepancy. Indeed, these two measures would have been smaller had the GB870303 line
been detected by BATSE (and not by Ginga). Yet in that case we would not worry about
a discrepancy between BATSE and Ginga. It is only when we compare consistency and 
inconsistency hypotheses that we test explicitly whether the instruments are discrepant. 



6. DISCUSSION 



The primary purpose of this paper is the development of a methodology to compare the 
Ginga and BATSE observations; the calculation of the actual detection and false positive 
probabilities (the $\alpha$ and $\beta$ quantities in the above equations) will be presented later in this
series. Nonetheless, the example we used is a reasonable approximation to the observations, 
and its analysis indicates the likely results of a more accurate calculation. The frequentist 
consistency statistic $P(n_G \geq 2, n_B = 0 \mid f, N_G, N_B)$ indicates that the detection of at least
two line features in the Ginga bursts, and none in the BATSE bursts, is fairly improbable
(the probability has an upper limit of $\sim 3\%$), but not unlikely enough to conclude there is an
inconsistency. In addition, the probability that the two detections would both be found in 
the Ginga bursts is 13%, which is not small enough to indicate there is a discrepancy. From 
our Bayesian odds ratios we infer that the quantitative analysis of the data (represented 
by the Bayes factors) is insufficient to shake our confidence in our understanding of the 
two detectors. Conversely, the Bayes factors do not prove conclusively that there is not 
a discrepancy; the data do not rule out a serious deficiency in the capabilities of either 
instrument, or in the analysis and interpretation of their observations. Therefore we 
continue to test BATSE's line-detecting capability, and to study issues such as the false 
positive probability. 

Bayesian inference has been faulted for the uncertainty as to the correct prior, and 
indeed in the calculations we present in Table 1 we use two different priors for the line
frequency. However, it should be noted that the basic conclusions are the same. As stated
above in §4, the uniform prior between $f = 0$ and 1 is formally correct in not using the
BATSE or Ginga data, and is also the most conservative in not attempting a quantitative
estimate of the line frequency from the Konus data. Therefore, the determination of the
line frequency prior does not introduce any ambiguity into our conclusions. Note that our 
conclusions do depend on the hypothesis priors, the quantification of the confidence in the 
analysis of the Ginga and BATSE spectra. 

BATSE observes strong bursts within which lines are detectable at a low enough rate 
that it is unlikely the statistical analysis presented here will lead us to conclude there is 
an inconsistency in the near future. Figures 1 and 2 indicate that many more than 100 
strong BATSE bursts would be necessary to conclude the Ginga and BATSE bursts are 
characterized by different line frequencies (or alternatively, the line detection rate is very 
different than calculated). Similarly, a much larger number of BATSE bursts is necessary 
for the Bayesian odds ratios comparing consistency and specific instrumental deficiency 
hypotheses to convince us there is a discrepancy. Therefore, in the near term the continued 
absence of BATSE detections will merely lower our estimate of the line frequency. 

A major but unavoidable deficiency of our analysis is that we approximate the 
continuous line distribution by the small number of lines detected. Our comparison of the 
two datasets is necessarily plagued by uncertainty concerning the line distributions. Of 
course, sufficient line detections to determine these distributions would prove definitively 
the existence of absorption lines in burst spectra, making irrelevant the statistical analysis 
of the possible discrepancy between Ginga and BATSE, and permitting the more satisfying 
study of an important burst phenomenon. 



7. SUMMARY 

We adopted a new Bayesian methodology to determine whether the two Ginga 
detections of absorption lines and the absence of any BATSE detections are inconsistent. 
This methodology permits us to compare specific hypotheses through an odds ratio which 
is the product of a quantitative Bayes factor, the ratio of the probabilities of obtaining 
the observations given the hypotheses, and a more subjective factor quantifying our prior 
expectations. 

The definitive application of this methodology to the BATSE and Ginga data requires 
detailed information on the bursts and line detectability, and will be presented later in this 
series of papers. However, we can draw tentative conclusions based on an approximate 
calculation. We find that the Bayes factors favor hypotheses that the understanding of
the Ginga and BATSE detectors is deficient, but not by large enough factors to exceed
our confidence in the understanding of these instruments. Similarly, the Bayes factor is
inconclusive for a comparison of the hypotheses that the Ginga and BATSE bursts are 
characterized by the same or different line frequencies. Thus given the tests to which 
the Ginga and BATSE instruments have been subjected, our Bayesian methodology 
leads us to conclude that the two instruments are not discrepant. In addition, the 
non-Bayesian consistency probabilities are not small enough to lead us to conclude there is 
an inconsistency. 

We thank the referee, Tom Loredo, for his insightful (and copious) comments which 
have improved this paper both in content and clarity. The BATSE instrument team effort 
at UCSD is supported by NASA contract NAS8-36081. 
Table 1: Illustrative Bayesian Calculation

                                      GB880205                                    GB870303                                    Joint
$N'_G$                                10                                          15
$N'_B$                                35                                          10
$\beta_{\max}$                        1/45                                        1/25                                        1/35

                                      Bayes Factors                                                                                                                         Prior Odds  Posterior Odds

Uniform Line Frequency Prior
$H_0 : H_1$                           0.0531                                      0.369                                       0.0196                                        100         $\sim 2$
$H_0 : H_2$                           $4.83\times10^{-4}/[\beta(1-\beta)^{44}]$   $1.54\times10^{-3}/[\beta(1-\beta)^{24}]$   $7.43\times10^{-7}/[\beta^2(1-\beta)^{68}]$   100         $\sim 7\times10^{-5}/[\beta^2(1-\beta)^{68}]$
$H_0 : H_2$, $\beta = \beta_{\max}$   0.0584                                      0.102                                       0.00653                                       100         $\sim 0.7$
$H_0 : H_3$                           1.91                                        4.06                                        7.75                                          $> 10$      $\sim 80$

Uniform Line Frequency Prior 0-0.2
$H_0 : H_1$                           0.0784                                      0.420                                       0.0329                                        100         $\sim 3$
$H_0 : H_2$                           $2.41\times10^{-3}/[\beta(1-\beta)^{44}]$   $7.52\times10^{-3}/[\beta(1-\beta)^{24}]$   $1.81\times10^{-5}/[\beta^2(1-\beta)^{68}]$   100         $\sim 2\times10^{-3}/[\beta^2(1-\beta)^{68}]$
$H_0 : H_2$, $\beta = \beta_{\max}$   0.292                                       0.501                                       0.160                                         100         $\sim 20$
$H_0 : H_3$                           0.56                                        1.01                                        0.57                                          $> 10$      $\sim 6$

Note.
$N'_G$: number of Ginga bursts in which lines are detectable
$N'_B$: number of BATSE bursts in which lines are detectable
$\beta_{\max}$: the false positive probability which minimizes $B_2$
$H_0 : H_x$: comparison of hypotheses $H_0$ and $H_x$, where:
$H_0$: Ginga and BATSE are consistent
$H_1$: BATSE is unable to detect lines
$H_2$: lines do not exist and thus the Ginga detections are spurious
$H_3$: different line frequencies characterize the BATSE and Ginga bursts






Table 2: Illustrative Frequentist Consistency Statistics

                                                        GB880205             GB870303             Joint
$N'_G$                                                  10                   15
$N'_B$                                                  35                   10
$P_{\max}(n_G \geq 1, n_B = 0 \mid f, N'_G, N'_B)$      $9.2\times10^{-2}$   $3.3\times10^{-1}$   $3.0\times10^{-2}$
$P(n_G = 1, n_B = 0 \mid n_G + n_B = 1, N'_G, N'_B)$    $2.2\times10^{-1}$   $6.0\times10^{-1}$   $1.3\times10^{-1}$






Figures 

Figure 1. Bayes factor $B_3$ vs. number of BATSE bursts $N'_B$ without a line detection
for a single detection in $N'_G = 10$ Ginga bursts. Shown are curves for one (solid), two
(short dashes) and three (long dashes) different line types. The Bayes factor compares
the hypothesis $H_0$ that Ginga and BATSE are consistent to the generalized inconsistency
hypothesis $H_3$ that the Ginga and BATSE bursts are characterized by different line
frequencies. A uniform prior probability was used for the line frequencies. Lines are 
assumed to be detectable if present. 

Figure 2. Bayes factor $B_3$ vs. number of BATSE bursts $N'_B$ for a single detection in
$N'_G = 5$ (solid curve), 10 (short dashes), 15 (long dashes) and 20 (dot-dot-dash) Ginga
bursts. The illustrative values used in Table 1 use $N'_G = 10$ for GB880205 and $N'_G = 15$ for
GB870303. 

Figure 3. The probability (eqn. [5]) of one line detection in $N'_G = 10$ Ginga bursts and
no detections in $N'_B = 35$ BATSE bursts in $f_G$-$f_B$ space; $f_G$ and $f_B$ are the Ginga and
BATSE line frequencies, allowed to be different. Logarithmic contours spaced factors of
100 apart are used; the maximum occurs at $f_G = 0.1$, $f_B = 0$. The line frequency for $H_0$ is
marginalized by integrating along the diagonal $f_G = f_B$ while the Ginga and BATSE line
frequencies $f_G$ and $f_B$ are marginalized for $H_3$ by integrating over the entire region.

Figure 4. Normalized distributions of line frequencies for one Ginga detection out of
$N'_G = 10$ bursts, and no BATSE detections out of $N'_B = 35$ bursts. Shown are distributions
based on the BATSE (long dashes), Ginga (solid curve) and combined (short dashes) 
datasets. Note that the abscissa is logarithmic, and therefore areas are not proportional to 
the probabilities assigned to different regions. 



